DETAILED ACTION
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

1. Claims 1-8, 10-16, 32-33, and 35-36 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Mitchell (US Patent No. 5,799,273) in view of Roth et al. (US Patent
Application Publication No. 2005/0038657).

2. Regarding claims 1 and 35, Mitchell discloses a system for relating words in an audio file 
to words in a text file, comprising: retrieving a text file comprising a plurality of textual words 
(col. 6, lines 20-29); generating an audio file comprising a plurality of audible words based on 
the text file (col. 6, lines 9-19); storing information relating each audible word to a 
corresponding textual word (col. 6, lines 48-65); and an electronic marker that indicates the 
position of the audible word within the text file (abstract; col. 9, lines 13-25 in which Mitchell 
discloses the user can delete and/or insert text and the recognition interface updates the links 
between the recognized word and the associated audio components such that link data is 
amended to indicate the correct character position of the word in the text). 

Mitchell does not disclose that the electronic marker is within the audio file; however, it
would have been obvious to one of ordinary skill at the time of the invention to provide for the 
electronic marker embedded in the audio file that indicates the position of the audible word 
within the text file so as to aid the user in reviewing the text as the audio is output.

Mitchell does not teach the audio file is generated by converting the textual words to a
plurality of audible words, or that the audio file is transmitted or available to a user of a 
telecommunications device. Roth discloses a combined speech recognition and text-to-speech 
system for use in a cellular telephone, in which text-to-speech (TTS) generation is used in 
conjunction with large vocabulary speech recognition to say words selected by the speech 
recognizer. TTS or recorded audio can be used to say both recognized text and the names of 
recognized commands after their recognition. The TTS can repeat text recognized by the 
speech recognition after each of a succession of end of utterance detections. A user can move a 
cursor back or forward in recognized text, and the TTS can speak one or more words at the 
cursor location after each such move. The speech recognition can be used to produce a choice
list of possible recognition candidates, and the TTS can be used to provide spoken output of one
or more of the candidates on the choice list. Roth suggests that such a system provides
effective large-vocabulary speech recognition on portable computers, with a user interface
that makes it easier and faster to create, edit, and use speech recognition on such
devices.

It would have been obvious to one of ordinary skill at the time of the invention to modify
the system of Mitchell to provide the linked audio and text data to users of a plurality of
computing devices, as suggested by Roth, for the purpose of providing effective
large-vocabulary speech recognition on portable computers with a user interface that
makes it easier and faster to create, edit, and use speech recognition on such
devices, as also suggested by Roth.

Regarding claim 2, Mitchell discloses the textual words comprise ASCII text (col. 5, lines
59-67).

Regarding claim 3, Mitchell discloses the audio file is stored in the form of a WAV file 
(col. 6, lines 9-29; col. 13, lines 26-30). 

Regarding claim 4, Mitchell discloses the information comprises voice tags embedded in 
the audio file (col. 7, lines 1-30). 

Regarding claim 5, Mitchell discloses the information comprises a file map relating a 
location of each textual word within the text file to a location of the corresponding audible word 
in the audio file (col. 6, line 48 to col. 8, line 3). 
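
For illustration only, a file map of the kind recited in claim 5 can be sketched as below. The sketch is not drawn from Mitchell or from the application; the names WordLink, build_file_map, and word_at_time, the millisecond units, and the assumption that audible words are stored back to back are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class WordLink:
    """One file-map entry (hypothetical layout, not taken from Mitchell)."""
    text_offset: int      # character position of the word in the text file
    length: int           # number of characters in the word
    audio_start_ms: int   # where the audible word begins in the audio file
    audio_end_ms: int     # where the audible word ends in the audio file


def build_file_map(text, durations_ms):
    """Pair each whitespace-delimited word with a consecutive audio span."""
    links, cursor, clock = [], 0, 0
    for word, duration in zip(text.split(), durations_ms):
        offset = text.index(word, cursor)   # locate the word in the text
        links.append(WordLink(offset, len(word), clock, clock + duration))
        cursor = offset + len(word)
        clock += duration
    return links


def word_at_time(links, t_ms):
    """Return the entry whose audio span contains t_ms, or None."""
    starts = [link.audio_start_ms for link in links]
    i = bisect_right(starts, t_ms) - 1
    if i >= 0 and t_ms < links[i].audio_end_ms:
        return links[i]
    return None
```

Given a playback position, word_at_time returns the entry from which the corresponding character position in the text file can be read.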

Regarding claims 6 and 36, Mitchell discloses the method steps are performed by logic
embodied in a computer readable medium (col. 4, line 66 to col. 5, line 36).

Regarding claims 7, 15, and 32, Mitchell discloses a system for relating words in an
audio file to words in a text file, comprising: retrieving a text file comprising a textual word (col. 6,
lines 20-29); generating an audible word corresponding to the textual word (col. 6, lines 9-19);
storing the audible word in an audio file (col. 6, lines 9-29; col. 13, lines 26-30); storing a file map,
the file map comprising: a first location locating the audible word within the audio file (Figures 3-4;
col. 6, line 48 to col. 7, line 30); and a second location locating the textual word within the text
file (Figures 3-4; col. 6, line 48 to col. 7, line 30).

Mitchell does not teach the audio file is generated by converting the textual words to a 
plurality of audible words, or that the audio file is transmitted or available to a user of a 
telecommunications device. Roth discloses a combined speech recognition and text-to-speech 
system for use in a cellular telephone, in which text-to-speech (TTS) generation is used in
conjunction with large vocabulary speech recognition to say words selected by the speech
recognizer. TTS or recorded audio can be used to say both recognized text and the names of 
recognized commands after their recognition. The TTS can repeat text recognized by the 
speech recognition after each of a succession of end of utterance detections. A user can move a 
cursor back or forward in recognized text, and the TTS can speak one or more words at the 
cursor location after each such move. The speech recognition can be used to produce a choice
list of possible recognition candidates, and the TTS can be used to provide spoken output of one
or more of the candidates on the choice list. Roth suggests that such a system provides
effective large-vocabulary speech recognition on portable computers, with a user interface
that makes it easier and faster to create, edit, and use speech recognition on such
devices.

It would have been obvious to one of ordinary skill at the time of the invention to modify
the system of Mitchell to provide the linked audio and text data to users of a plurality of computing
devices, as suggested by Roth, for the purpose of providing effective large-vocabulary speech
recognition on portable computers with a user interface that makes it easier and faster to
create, edit, and use speech recognition on such devices, as also
suggested by Roth.

Regarding claims 8, 16, and 33, Mitchell discloses repeating the steps of the method for a
plurality of textual words in the text file (col. 5, line 59 to col. 8, line 3; Figures 3-4).

Regarding claim 10, Mitchell discloses a system for relating words in an audio file to 
words in a text file, comprising: retrieving a text file comprising a plurality of textual words (col. 
6, lines 20-29); generating an audible word corresponding to each textual word (col. 6, lines 9-29);
and playing the audible words to a user in real time as the audible words are generated (col.
8, line 52 to col. 10, line 2); and during the playing of the audible words, determining a current
textual word corresponding to the audible word currently being played (col. 8, line 52 to col. 10, line
2).

Mitchell does not teach the audio file is generated by converting the textual words to a 
plurality of audible words with each audible word comprising media stream packets, or that the 
audio file is transmitted or available to a user of a telecommunications device. Roth discloses a 
combined speech recognition and text-to-speech system for use in a cellular telephone, in which 
text-to-speech (TTS) generation is used in conjunction with large vocabulary speech recognition 
to say words selected by the speech recognizer. TTS or recorded audio can be used to say both 
recognized text and the names of recognized commands after their recognition. The TTS can 
repeat text recognized by the speech recognition after each of a succession of end of utterance 
detections. A user can move a cursor back or forward in recognized text, and the TTS can 
speak one or more words at the cursor location after each such move. The speech recognition 
can be used to produce a choice list of possible recognition candidates, and the TTS can be used
to provide spoken output of one or more of the candidates on the choice list. Roth suggests that
such a system provides effective large-vocabulary speech recognition on portable
computers, with a user interface that makes it easier and faster to
create, edit, and use speech recognition on such devices.

It would have been obvious to one of ordinary skill at the time of the invention to modify
the system of Mitchell to provide the linked audio and text data to users of a plurality of computing
devices, as suggested by Roth, for the purpose of providing effective large-vocabulary speech
recognition on portable computers with a user interface that makes it easier and faster to
create, edit, and use speech recognition on such devices, as also
suggested by Roth.

Regarding claim 11, Mitchell discloses the textual words comprise ASCII text (col. 5,
lines 59-67).

Regarding claim 12, Mitchell discloses initializing a counter identifying textual words 
within the text file (col. 6, line 48 to col. 7, line 30); and incrementing the counter after each 
audible word is played (col. 6, line 48 to col. 7, line 30); wherein the step of determining 
comprises identifying the current textual word using the counter (col. 6, line 48 to col. 7, line 
30). 
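
As a minimal illustration of the counter recited in claim 12, and not a description of Mitchell's implementation, the initialize/play/increment sequence can be sketched as below. The function name play_with_counter and the play callback are hypothetical.

```python
def play_with_counter(words, play):
    """Play words in order; yield (counter, current textual word) pairs."""
    counter = 0                      # initialize the counter at the first word
    while counter < len(words):
        current = words[counter]     # determine the current textual word
        play(current)                # hypothetical playback callback
        yield counter, current
        counter += 1                 # increment after the word is played
```

At any point during playback, the counter value identifies the textual word that corresponds to the audible word being played.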

Regarding claim 13, Mitchell discloses storing information about the audible word, the 
information comprising: an identifier for the textual word corresponding to the audible word (col.
6, line 48 to col. 8, line 3); and a time at which the audible word was played (col. 6, line 48 to 
col. 8, line 3; Figures 3-4). 

Regarding claim 14, Mitchell discloses the method steps are performed by logic
embodied in a computer readable medium (col. 4, line 66 to col. 5, line 36).

5. Claims 9, 17-31, 34 and 37 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Mitchell in view of Dionne (US Patent No. 6,068,487) and further in view of Frulla et al 
(US Patent No. 6,424,357).

6. Regarding claims 9, 17-31, 34 and 37, Mitchell discloses a system which provides a user
interface for relating words in an audio file to words in a text file, comprising: retrieving a text file
comprising a textual word (col. 6, lines 20-29); generating an audible word corresponding to the
textual word (col. 6, lines 9-19); storing the audible word in an audio file (col. 6, lines 9-29; col. 13,
lines 26-30); storing a file map, the file map comprising: a first location locating the audible word
within the audio file (Figures 3-4; col. 6, line 48 to col. 7, line 30); and a second location locating
the textual word within the text file (Figures 3-4; col. 6, line 48 to col. 7, line 30).

Mitchell does not teach that the system identifies an audible word to be spelled in
response to the command to spell; identifies a textual word in a text file corresponding to the
audible word to be spelled; and audibly spells the textual word. Dionne teaches a method for
having a reading machine spell a word, which includes retrieving a word to be spelled,
displaying letters of the word, spelling the word, and providing a text-to-speech output of the word
(col. 3, lines 8-34). Dionne teaches that the system is useful in assisting individuals with
learning disabilities or severe visual impairments.

It would have been obvious to one of ordinary skill at the time of the invention to modify 
the system of Mitchell to provide the spelling of words in the text, to aid in the editing of 
recognized text and in the correcting of recognition errors, for the purpose of assisting 
individuals with visual impairments with editing of text.

Mitchell and Dionne do not teach the command input to the system is via a voice 
command. However, implementation of voice commands to allow for system functionality and 
control similar to that of hand-controlled input devices was well known in the art.

Frulla discloses a voice input system that has a microphone coupled to a computing device,
with the computing device typically operating a computer software application. A user speaks
voice commands into the microphone, with the computing device operating a voice command
module that interprets the voice command and causes the graphical or non-graphical application
to be commanded and controlled consistent with the use of a physical mouse (Figures 2-3; col.
4, lines 57-64). Frulla specifically teaches the system is advantageous in environments in which it
is inconvenient or impractical to use a mouse, thereby making the user interface more
convenient and efficient for a user to input information and commands.

It would have been obvious to one of ordinary skill at the time of the invention to modify 
the system of Mitchell to provide the spelling of words in the text, to aid in the editing of 
recognized text and in the correcting of recognition errors and to further provide voice command 
control, as suggested by Frulla, for the purpose of making the user interface more convenient and 
efficient for a user to input information and commands in situations in which using a physical 
mouse is impractical or cumbersome. 

Response to Arguments 
7. Applicant's arguments with respect to claims 1-8, 10-16, 32-33, and 35-36 have been 
considered but are moot in view of the new ground(s) of rejection. 

Regarding Applicant's request for an indication as to which provisional application 
provides support for the subject matter in Roth, the provisional applications identified within the 
Roth document all provide support for an audio file that is generated by converting textual
words to a plurality of audible words and that the audio is transmitted or available to a user of a
telecommunications device (cellular telephone). 

8. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Angela A. Armstrong whose telephone number is 571-272-7598. 
The examiner can normally be reached on Monday-Thursday 11:30-8:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David Hudspeth, can be reached on 571-272-7843. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would 
like assistance from a USPTO Customer Service Representative or access to the automated 
information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Primary Examiner